Phrase Translation Probabilities with ITG Priors and Smoothing as Learning Objective
نویسندگان
چکیده
The conditional phrase translation probabilities constitute the principal components of phrase-based machine translation systems. These probabilities are estimated using a heuristic method that does not seem to optimize any reasonable objective function of the word-aligned, parallel training corpus. Earlier efforts on devising a better understood estimator either do not scale to reasonably sized training data, or lead to deteriorating performance. In this paper we explore a new approach based on three ingredients (1) A generative model with a prior over latent segmentations derived from Inversion Transduction Grammar (ITG), (2) A phrase table containing all phrase pairs without length limit, and (3) Smoothing as learning objective using a novel Maximum-A-Posteriori version of Deleted Estimation working with Expectation-Maximization. Where others conclude that latent segmentations lead to overfitting and deteriorating performance, we show here that these three ingredients give performance equivalent to the heuristic method on reasonably sized training data.
منابع مشابه
Semantic smoothing and fabrication of phrase pairs for SMT
In statistical machine translation systems, phrases with similar meanings often have similar but not identical distributions of translations. This paper proposes a new soft clustering method to smooth the conditional translation probabilities for a given phrase with those of semantically similar phrases. We call this semantic smoothing (SS). Moreover, we fabricate new phrase pairs that were not...
متن کاملPhrase Clustering for Smoothing TM Probabilities - or, How to Extract Paraphrases from Phrase Tables
This paper describes how to cluster together the phrases of a phrase-based statistical machine translation (SMT) system, using information in the phrase table itself. The clustering is symmetric and recursive: it is applied both to sourcelanguage and target-language phrases, and the clustering in one language helps determine the clustering in the other. The phrase clusters have many possible us...
متن کاملImproving word alignment for low resource languages using English monolingual SRL
We introduce a new statistical machine translation approach specifically geared to learning translation from low resource languages, that exploits monolingual English semantic parsing to bias inversion transduction grammar (ITG) induction. We show that in contrast to conventional statistical machine translation (SMT) training methods, which rely heavily on phrase memorization, our approach focu...
متن کاملA semantically confidence-weighted ITG induction algorithm
We propose a new algorithm to induce inversion transduction grammars, in which a crosslingual semantic frame based objective function is injected as confidence weighting in the early stages of statistical machine translation training. Unlike recent work on improving translation adequacy that uses a monolingual semantic frame based objective function to drive the tuning of loglinear mixture weig...
متن کاملImproved Discriminative ITG Alignment using Hierarchical Phrase Pairs and Semi-supervised Training
While ITG has many desirable properties for word alignment, it still suffers from the limitation of one-to-one matching. While existing approaches relax this limitation using phrase pairs, we propose a ITG formalism, which even handles units of non-contiguous words, using both simple and hierarchical phrase pairs. We also propose a parameter estimation method, which combines the merits of both ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2008